Finding the Number of Clusters in Unlabelled Datasets Using Extended Cluster Count Extraction (ECCE)
Abstract
Clustering analysis is the task of partitioning a set of objects O = {o1, ..., on} into c self-similar subsets based on available data. In general, clustering of unlabeled data poses three major problems: 1) assessing cluster tendency, i.e., how many clusters to seek; 2) partitioning the data into c meaningful groups; and 3) validating the c clusters that are discovered. All clustering algorithms ultimately rely on one or more human inputs, and the most important input is the number of clusters (c) to seek. Many pre- and post-clustering methods relieve the user of this choice, but they ultimately make it by thresholding some value in the code. The choice of c is thus transferred to the equivalent choice of a hidden threshold that determines c "automatically". In contrast, tendency assessment attempts to estimate c before clustering occurs. Here, we represent the structure of an unlabeled data set as a Reordered Dissimilarity Image (RDI), in which the pairwise dissimilarity information about a data set of n objects is displayed as an n × n image. The RDI is generated using VAT (Visual Assessment of Cluster Tendency), which highlights potential clusters as a set of "dark blocks" along the diagonal of the image, so that the number of clusters can be estimated from the number of dark blocks along the diagonal. We develop a new method called "Extended Cluster Count Extraction" (ECCE) for counting the number of clusters formed along the diagonal of the RDI.
General Terms: Data Mining, Image Processing, Artificial Intelligence.
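To make the RDI idea concrete, the following is a minimal sketch of the VAT reordering step only, assuming a symmetric dissimilarity matrix held in a NumPy array. The function names `vat_order` and `reordered_dissimilarity_image` are illustrative and do not come from the paper, and the ECCE counting step itself is not shown because the abstract does not describe it.

```python
import numpy as np

def vat_order(D):
    """Return a VAT-style ordering for a symmetric dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = set(range(n)) - {int(i)}
    # best[j] = smallest dissimilarity from j to any object already ordered.
    best = D[i].copy()
    while remaining:
        rem = sorted(remaining)
        # Next object: the unordered one closest to the ordered set.
        j = rem[int(np.argmin(best[rem]))]
        order.append(j)
        remaining.remove(j)
        best = np.minimum(best, D[j])
    return order

def reordered_dissimilarity_image(D):
    """Permute rows and columns of D by the VAT ordering."""
    D = np.asarray(D, dtype=float)
    order = vat_order(D)
    return D[np.ix_(order, order)], order

# Toy data (an assumption for illustration): two interleaved 1-D groups.
pts = np.array([0.0, 10.0, 0.1, 10.1, 0.2, 10.2])
D = np.abs(pts[:, None] - pts[None, :])
rdi, order = reordered_dissimilarity_image(D)
print(order)
print(np.round(rdi, 1))
```

On this toy input the reordering places the members of each group next to each other, so the small dissimilarities collect in two square blocks on the diagonal of `rdi`. Displayed as a grayscale image, those would be the "dark blocks" that a counting method then has to detect.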
Similar resources
A Comparative study of Clustering in Unlabelled Datasets Using Extended Dark Block Extraction and Extended Cluster Count Extraction
One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data prior to clustering. In this paper, we implement a new method for determining the number of clusters called Extended Dark Block Extraction (EDBE), which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set. Its basic steps include 1) Generatin...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have drawn many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Finding the Number of Clusters in Unlabeled Datasets using Extended Dark Block Extraction
Clustering analysis is the problem of partitioning a set of objects O = {o1... on} into c self-similar subsets based on available data. In general, clustering of unlabeled data poses three major problems: 1) assessing cluster tendency, i.e., how many clusters to seek? 2) Partitioning the data into c meaningful groups, and 3) validating the c clusters that are discovered. We address the first pr...
A Hybrid Data Mining Model Using Association Rules and Clustering to Determine a Discounting Strategy: A Case Study of Pegah Distribution Company
Sales promotion is an important issue in most sales and distribution companies, and finding the most appropriate strategy for it is a challenge for marketers. Discounting (offering) is one sales promotion strategy. Using a fixed discounting strategy for all customers and all goods reduces the chance of success. A discounting strategy needs a model for providing best ...
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...